1. Introduction

This project centers on creating a molecular framework of DCIS (ductal carcinoma in
situ). DCIS is considered to be the precursor to invasive ductal carcinoma (IDC), the
most common form of breast cancer. IDC accounts for 80% of all breast cancers,
predominantly affecting women aged 55 and older; however, at least a third of women
with IDC are diagnosed before they reach 55.

Utilizing a unique bank of frozen mammary biopsies, containing samples with DCIS
alone and samples with a combination of DCIS and IDC, we have started to profile both
DCIS and related tissue components. It is our aim to sample the ~300 biopsies and to
compare DCIS lesions, within and between patients, by both RNA-seq and whole genome
amplification, and to see how these may be correlated with IDC lesions. We also intend
to look for changes in the stroma between those patients that present with IDC and
those that do not. This work aims to identify characteristics that may be suggestive of
a patient's likelihood of progressing from DCIS to IDC, with the purpose of reducing
the need for overtreatment of this disease.

2. Keywords

3. Accomplishments

Aim 1. The evolution of DCIS.

Task  1.  Sample  collection  and  annotation 

Annotation of the tissue bank is ongoing.

Task  2.  Sample  choice  from  frozen  bank. 

We have now received 146 samples from the frozen bank and have processed 115 of
these so far. These include pure DCIS and also mixed DCIS and IDC samples. For this Aim
we have selected samples that had 5 or more DCIS lesions, as these will be more
informative for looking at the evolution of DCIS. We are using both CNV profiles and
SNPs (variants are called using the stromal tissue).


Task  3.  Laser  capture  of  frozen  samples  for  characterization 


From each of the 115 samples we have dissected material for DNA; however, material
from 18 patients has been selected for characterization (based on having 5 or more DCIS
lesions). We have selected DCIS lesions, IDC regions, normal epithelium (where
present), atypical epithelium, solid DCIS, papillary DCIS, benign epithelium, and stroma
(as far away from DCIS or IDC regions as possible). The table below represents the
distribution across patients, with a total of 214 lesions including the normal and variants
of epithelium.


Number of DCIS   Number of            Number of samples   Number of IDC lesions
lesions          samples              with IDC            per sample
--------------   ------------------   -----------------   ---------------------
5                4                    2                   4, 4
6                7                    3                   8, 2, 4
7                6                    3                   5, 7, 5
8                1                    0
9                1                    1                   5
10               1                    1                   6
11               0                    0
12               0                    0
13               1                    0
Total            144 (DCIS lesions)   10                  50
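
The totals in the table can be cross-checked from the per-row counts. The short Python sketch below re-enters the rows by hand (the variable names are ours, chosen only for illustration, and the row for 11 lesions is taken to contain 0 samples) and recomputes the total number of DCIS lesions, the number of samples with IDC and the total number of IDC lesions.

```python
# Rows of the table: (DCIS lesions per sample, number of samples,
# IDC lesion counts for those samples that also contain IDC).
# Assumption: the row for 11 lesions holds 0 samples, which is what
# the stated total of 144 DCIS lesions implies.
rows = [
    (5, 4, [4, 4]),
    (6, 7, [8, 2, 4]),
    (7, 6, [5, 7, 5]),
    (8, 1, []),
    (9, 1, [5]),
    (10, 1, [6]),
    (11, 0, []),
    (12, 0, []),
    (13, 1, []),
]

# Each sample in a row contributes that row's number of DCIS lesions.
total_dcis_lesions = sum(n_lesions * n_samples for n_lesions, n_samples, _ in rows)
# One entry in the IDC list corresponds to one sample with IDC.
samples_with_idc = sum(len(idc) for _, _, idc in rows)
total_idc_lesions = sum(sum(idc) for _, _, idc in rows)

print(total_dcis_lesions, samples_with_idc, total_idc_lesions)  # 144 10 50
```

The value 144 in the Total row is therefore the number of DCIS lesions (lesions per sample multiplied by samples, summed over rows), not a count of samples.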

Task  4.  Exome  capture  and  sequencing 

We initiated work on this using the Nextera Exome Capture kit; however, on the couple
of samples we used, this did not prove successful, as there was a very low distribution
of probes represented. Having investigated the costs and what is needed to get deep
enough coverage for accurately calling CNVs and SNVs, we decided to make use of the
X10 sequencing machine at the NYGC and do whole genome sequencing instead. A trial
run with this demonstrated that the Whole Genome Sequencing kit that we were using
(and other kits on the market) was only compatible with sequencing machines after the
DNA had been sheared (resulting in removal of end primers). This was not efficient with
sequencing on the X10, as reads are generally longer and shearing would result in very
short reads. We therefore established a new protocol whereby we enzymatically
chewed the primers off the ends of DNA strands after amplification with the WGA kit;
this then allowed us to attach the primers for sequencing (this was somehow hindered
without removal of the WGA primers). This pipeline proved very effective, and in
addition to the 18 patients we selected for the evolution study, we have also sequenced
an additional 40 patients, making a total of 59 patients. This amounts to 410 DNA X10
libraries, comprising 81 IDC, 201 DCIS, 69 stroma and the remainder normal
epithelium, atypia and benign epithelium.


Task  5.  Analyze  Exome  capture  data 


The final submission of libraries for X10 sequencing is still being processed; however, all
410 libraries should have been processed within a week or two. We run a few quality
control analyses on the samples once they have gone through a standard pipeline (this
is done by the NYGC). Concordance analysis looks for any discrepancies between a
"normal" sample and its paired "tumor" sample. Pairs generally have over 90%
concordance; this analysis has nevertheless proved useful, as it identified a misread tube
label and thus allowed us to correct the error. Where samples have a low concordance
that cannot be corrected easily (i.e. we have no other way to identify a mislabeled
sample), they are put to one side for the time being. We also run analyses
for "contamination", which could come from the tissue or from other samples in the
library prep. The lower the quality or quantity of a sample, the greater the effect of any
contamination is likely to be. Samples with very low coverage are also put to one side.
After removing samples with low concordance, high contamination or low sequencing
coverage, we currently have data from 17 patients and 165 libraries (we are still waiting
on analysis of an additional ~100 samples and ~37 patients).
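
The three exclusion criteria described above (low concordance, contamination, low coverage) can be sketched as a simple filter. This is only an illustration: the report states that pairs generally exceed 90% concordance but gives no cutoffs for contamination or coverage, so the thresholds, field names and library records below are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical thresholds. Only the ~90% concordance figure comes from the
# report; the contamination and coverage cutoffs are placeholders.
MIN_CONCORDANCE = 0.90
MAX_CONTAMINATION = 0.05
MIN_MEAN_COVERAGE = 10.0


@dataclass
class LibraryQC:
    library_id: str
    concordance: float    # agreement with the paired "normal" sample (0-1)
    contamination: float  # estimated contaminating fraction (0-1)
    mean_coverage: float  # mean sequencing depth


def passes_qc(lib: LibraryQC) -> bool:
    """Keep a library only if it passes all three checks."""
    return (
        lib.concordance >= MIN_CONCORDANCE
        and lib.contamination <= MAX_CONTAMINATION
        and lib.mean_coverage >= MIN_MEAN_COVERAGE
    )


# Invented example records, one per failure mode plus one passing library.
libraries = [
    LibraryQC("lib_A", 0.97, 0.01, 32.0),
    LibraryQC("lib_B", 0.62, 0.01, 30.0),  # low concordance, e.g. a label mix-up
    LibraryQC("lib_C", 0.95, 0.20, 28.0),  # high contamination
    LibraryQC("lib_D", 0.96, 0.02, 3.0),   # very low coverage
]

kept = [lib.library_id for lib in libraries if passes_qc(lib)]
print(kept)  # ['lib_A']
```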

Initial analysis of CNV data has been carried out on 11 patients thus far and shows that
there are both similarities and differences between DCIS lesions within the
same patient. An example for one patient is below. The plot shows CNVs that are
located among the 6 DCIS samples from this patient (sample number on the Y axis,
chromosome number along the X axis). Some CNVs are shared by all 6 samples and some
are unique to just one or two samples.
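
The shared-versus-unique comparison described above can be sketched by counting in how many samples each CNV appears. This is a simplified illustration rather than the pipeline used for the plot: the CNV calls are invented, and two calls are treated as the same event only if chromosome, coordinates and state match exactly, whereas real segments would need overlap-based matching.

```python
from collections import Counter

# Invented CNV calls per DCIS sample: (chromosome, start, end, state).
cnv_calls = {
    "DCIS_1": {("chr1", 100, 200, "gain"), ("chr8", 50, 90, "loss")},
    "DCIS_2": {("chr1", 100, 200, "gain"), ("chr17", 10, 40, "gain")},
    "DCIS_3": {("chr1", 100, 200, "gain"), ("chr8", 50, 90, "loss")},
}

# Count the number of samples carrying each CNV.
counts = Counter(cnv for calls in cnv_calls.values() for cnv in calls)
n_samples = len(cnv_calls)

shared_by_all = sorted(cnv for cnv, n in counts.items() if n == n_samples)
unique_to_one = sorted(cnv for cnv, n in counts.items() if n == 1)

print(shared_by_all)  # [('chr1', 100, 200, 'gain')]
print(unique_to_one)  # [('chr17', 10, 40, 'gain')]
```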


The plot below shows the CNVs that are shared or unique between the DCIS and the
IDC of this same patient. The X axis indicates whether a region is shared (1 means the
CNV is located in only a DCIS sample or only an IDC sample; 2 means the CNV is found
in both an IDC sample and a DCIS sample).

We are currently working on constructing phylogenetic trees from the SNVs using the
program LICHeE. The trees are fairly complex, so we are working on creating a simpler,
single tree using only regions of 2N as determined by the CNV profiles. We have also
worked out a confidence cutoff for the SNVs based on the number of reads per SNV
and the variant allele frequency (VAF). We have made this fairly stringent to minimize "noise".
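
A confidence cutoff of this kind, combining read support and VAF, might look like the sketch below. The report does not give the actual cutoff values, so the function name, its parameters and the default thresholds are hypothetical and serve only to show the shape of such a filter.

```python
def passes_snv_filter(ref_reads, alt_reads, min_depth=20, min_alt_reads=5, min_vaf=0.10):
    """Return True if an SNV call has enough read support and a high enough VAF.

    All default thresholds are placeholders, not the values used in the study.
    """
    depth = ref_reads + alt_reads
    if depth < min_depth or alt_reads < min_alt_reads:
        return False
    vaf = alt_reads / depth  # variant allele frequency
    return vaf >= min_vaf


print(passes_snv_filter(45, 15))  # True: depth 60, VAF 0.25
print(passes_snv_filter(3, 2))    # False: depth too low
print(passes_snv_filter(95, 5))   # False: VAF 0.05 is below the cutoff
```

Raising any of the three thresholds makes the filter more stringent, trading sensitivity for fewer false-positive calls.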

In addition to this, we have carried out work on the mutational signatures of these
samples to look for characteristic signature patterns. So far we have found a couple of
patients with the APOBEC signature (see figure below).


[Figure: mutational signature contributions. Signature 1: 0.084; Signature 2: 0.298; Signature 8: 0.308; Signature 13: 0.08; Signature 30: 0.083]


Further, more in-depth analysis will be carried out on the phylogeny of the DCIS and IDC
lesions, and on whether there are any associations between the differences we see in
the DNA data and the differences we see in the RNA data. This is being carried out
together with the bioinformaticians at the NYGC, and we will seek further analysis from
groups here at Cambridge who specialize in tumor evolution.


Aim  2.  A  transcriptional  landscape  of  early  breast  cancer. 

Task  6.  Sample  choice  from  frozen  bank. 

- Choose samples for pure DCIS and DCIS with microinvasion/IDC

Task 7. Laser capture of frozen samples for characterization

We have now received 146 samples from the frozen bank and have processed 115 of
these so far. For each sample the following regions are annotated by Joe (the
pathologist) and dissected in triplicate for RNA: DCIS, IDC, normal epithelium, atypical
epithelium, solid DCIS, papillary DCIS, benign epithelium, areas of high immune
infiltration, stroma adjacent to DCIS, stroma adjacent to IDC and stroma away (as far
away from DCIS or IDC regions as possible). This has provided over 6300 lesions. This
task is still ongoing.


Task  8.  RNAseq  library  construction 


Approximately 1300 RNA-seq libraries have been sequenced. We are currently
focusing on DCIS, IDC, and normal and other epithelium, and are prioritizing these
for sequencing. This task is still ongoing.


Task  9.  Analyze  RNAseq  datasets 

Thus far we have analyzed 1200 DCIS, IDC, benign/normal epithelium and stroma away
libraries. For quality control, samples with a gene assignment rate of < 15% together
with a percentage of uniquely mapped reads of < 20% are removed from the group analysis.
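
The RNA-seq quality control rule can be written as a small predicate. The sketch below assumes that a library is removed only when both conditions hold (gene assignment below 15% and uniquely mapped reads below 20%); that reading of the rule, the function name and the example values are our assumptions.

```python
def fails_rnaseq_qc(gene_assignment_pct, uniquely_mapped_pct):
    """Return True if a library should be removed from the group analysis.

    Assumes both conditions must hold for a library to be dropped.
    """
    return gene_assignment_pct < 15.0 and uniquely_mapped_pct < 20.0


# Invented example libraries: (gene assignment %, uniquely mapped reads %).
libraries = {
    "rna_lib_1": (62.0, 85.0),  # good library
    "rna_lib_2": (9.0, 12.0),   # fails both criteria
    "rna_lib_3": (12.0, 55.0),  # low assignment only, kept under this reading
}

kept = [name for name, (assigned, mapped) in libraries.items()
        if not fails_rnaseq_qc(assigned, mapped)]
print(kept)  # ['rna_lib_1', 'rna_lib_3']
```

If the intended rule is that either condition alone is enough to exclude a library, the `and` would become `or`, and `rna_lib_3` would also be dropped.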

We have carried out subtype analysis on the DCIS and IDC samples that we have data
for, using both the PAM50 and the AIMS methods. We have decided to use the output
of the AIMS method rather than PAM50, as we have found that with PAM50 the subtype
profiles tend to change depending on which samples are added to the group. With the
AIMS method this does not happen, and each sample is classified based only on the data
for that sample.
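
The difference between a cohort-dependent call and a single-sample call can be illustrated with a toy example. The code below is not an implementation of PAM50 or AIMS: it uses two invented genes and two invented rules purely to show why a call that relies on centering against the cohort can change when the cohort changes, while a rule that only compares genes within one sample cannot.

```python
from statistics import median


def centered_call(sample, cohort):
    """Toy cohort-dependent rule: 'high' if GENE_A exceeds the cohort median."""
    cohort_median = median(s["GENE_A"] for s in cohort)
    return "high" if sample["GENE_A"] > cohort_median else "low"


def pairwise_call(sample):
    """Toy single-sample rule: compare two genes within the same sample."""
    return "high" if sample["GENE_A"] > sample["GENE_B"] else "low"


# Invented expression values for one target sample and two different cohorts.
target = {"GENE_A": 8.0, "GENE_B": 6.0}
cohort_1 = [target, {"GENE_A": 5.0, "GENE_B": 6.5}, {"GENE_A": 6.0, "GENE_B": 7.0}]
cohort_2 = [target, {"GENE_A": 10.0, "GENE_B": 4.0}, {"GENE_A": 11.0, "GENE_B": 5.0}]

print(centered_call(target, cohort_1))  # high (cohort median is 6.0)
print(centered_call(target, cohort_2))  # low  (cohort median is 10.0)
print(pairwise_call(target))            # high, whatever the cohort
```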

Below is a table showing the subtype profile for some of the patients analyzed so far.

We have found that the subtype of the IDC can be different from that of the DCIS lesions
from the same patient, as shown below.

We are doing some preliminary differential analysis on the RNA-seq data using different
groups as defined by both the subtyping and the DNA data.

Opportunities  for  training  and  professional  development 

Nothing  to  report  (not  intended  for  training) 

Results  disseminated  to  communities  of  interest 

Nothing  to  report 